This report investigates how weather relates to street-level crime in Colchester during 2024. The main goal is to find out whether temperature, rain, and wind are related to how often crimes happen and which types occur. We use two datasets: one with crime reports and one with daily weather data.
First, we clean both datasets by handling missing values, removing unnecessary columns, and making sure the formats match. Then we count crimes per month so they can be compared with the weather data, which we also aggregate by month (mean temperature and wind speed, total rainfall). This lets us combine the two datasets in a meaningful way.
Next, we use different kinds of charts to explore the data. Bar charts and tables show how many crimes happened and what types they were. Pie charts and dot plots show more about the categories of crimes. Histograms and density plots show how weather data is spread out. We also use more advanced visuals like smoothed time series, scatter plots, and correlation charts to find trends over time and how different factors are connected. These visual tools help us discover patterns and can be useful for planning public safety policies in the future.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.4.3
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.3
## corrplot 0.95 loaded
library(DT)
## Warning: package 'DT' was built under R version 4.4.3
library(naniar)
## Warning: package 'naniar' was built under R version 4.4.3
We’re using two main datasets in this project: crime24.csv, which has street-level crime data for Colchester in 2024, and temp24.csv, which has daily weather information. To keep things consistent, we renamed the “Date” column in the weather data to lowercase “date” so it matches with the crime data. We then used the head() function to look at the first few rows of each dataset. This gives us a quick idea of what the data looks like and helps us check that everything loaded correctly.
# Loading datasets
crime <- read_csv("crime24.csv")
## New names:
## Rows: 6304 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (7): category, persistent_id, date, street_name, location_type, location... dbl
## (5): ...1, lat, long, street_id, id lgl (1): context
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
temp <- read_csv("temp24.csv")
## Rows: 366 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): WindkmhDir
## dbl (15): station_ID, TemperatureCAvg, TemperatureCMax, TemperatureCMin, Td...
## lgl (1): PreselevHp
## date (1): Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Renaming 'Date' to 'date' in temp
temp <- temp %>% rename(date = Date)
# This is how we will observe the structure
head(crime)
head(temp)
Summary statistics and missing data checks were conducted to better understand the structure and quality of the datasets. The summary() function was used to explore variable types, ranges, and typical values, which is critical before moving on to data cleaning and visualization. To identify missing data, colSums(is.na()) was applied to both datasets.
# Let's look at the summary for each dataset
summary(crime)
## ...1 category persistent_id date
## Min. : 1 Length:6304 Length:6304 Length:6304
## 1st Qu.:1577 Class :character Class :character Class :character
## Median :3152 Mode :character Mode :character Mode :character
## Mean :3152
## 3rd Qu.:4728
## Max. :6304
## lat long street_id street_name
## Min. :51.88 Min. :0.8788 Min. :2152686 Length:6304
## 1st Qu.:51.89 1st Qu.:0.8966 1st Qu.:2153025 Class :character
## Median :51.89 Median :0.9013 Median :2153155 Mode :character
## Mean :51.89 Mean :0.9029 Mean :2153873
## 3rd Qu.:51.89 3rd Qu.:0.9088 3rd Qu.:2153366
## Max. :51.90 Max. :0.9246 Max. :2343256
## context id location_type location_subtype
## Mode:logical Min. :115954844 Length:6304 Length:6304
## NA's:6304 1st Qu.:118009952 Class :character Class :character
## Median :120228058 Mode :character Mode :character
## Mean :120403000
## 3rd Qu.:122339060
## Max. :125550731
## outcome_status
## Length:6304
## Class :character
## Mode :character
##
##
##
summary(temp)
## station_ID date TemperatureCAvg TemperatureCMax
## Min. :3590 Min. :2024-01-01 Min. :-2.60 Min. : 1.10
## 1st Qu.:3590 1st Qu.:2024-04-01 1st Qu.: 7.00 1st Qu.:10.72
## Median :3590 Median :2024-07-01 Median :10.95 Median :14.75
## Mean :3590 Mean :2024-07-01 Mean :10.98 Mean :15.08
## 3rd Qu.:3590 3rd Qu.:2024-09-30 3rd Qu.:14.50 3rd Qu.:19.60
## Max. :3590 Max. :2024-12-31 Max. :23.10 Max. :29.80
##
## TemperatureCMin TdAvgC HrAvg WindkmhDir
## Min. :-6.100 Min. :-6.000 Min. :59.60 Length:366
## 1st Qu.: 3.325 1st Qu.: 4.725 1st Qu.:75.90 Class :character
## Median : 6.800 Median : 8.200 Median :82.75 Mode :character
## Mean : 6.486 Mean : 7.752 Mean :81.74
## 3rd Qu.: 9.500 3rd Qu.:11.000 3rd Qu.:88.80
## Max. :16.700 Max. :16.900 Max. :98.60
##
## WindkmhInt WindkmhGust PresslevHp Precmm
## Min. : 3.90 Min. : 11.10 Min. : 978.9 Min. : 0.000
## 1st Qu.:12.22 1st Qu.: 31.50 1st Qu.:1007.5 1st Qu.: 0.000
## Median :15.80 Median : 38.90 Median :1013.8 Median : 0.200
## Mean :16.52 Mean : 40.81 Mean :1013.7 Mean : 1.864
## 3rd Qu.:19.80 3rd Qu.: 48.20 3rd Qu.:1021.0 3rd Qu.: 1.600
## Max. :42.50 Max. :105.60 Max. :1037.3 Max. :38.000
## NA's :24
## TotClOct lowClOct SunD1h VisKm
## Min. :0.000 Min. :1.000 Min. : 0.000 Min. : 0.10
## 1st Qu.:3.800 1st Qu.:5.800 1st Qu.: 0.325 1st Qu.:20.73
## Median :5.600 Median :6.900 Median : 3.500 Median :30.95
## Mean :5.304 Mean :6.609 Mean : 4.203 Mean :31.42
## 3rd Qu.:7.200 3rd Qu.:7.600 3rd Qu.: 7.100 3rd Qu.:41.20
## Max. :8.000 Max. :8.000 Max. :15.600 Max. :71.20
## NA's :5
## SnowDepcm PreselevHp
## Min. :1.00 Mode:logical
## 1st Qu.:1.25 NA's:366
## Median :1.50
## Mean :1.50
## 3rd Qu.:1.75
## Max. :2.00
## NA's :364
# Checking for NA values
colSums(is.na(crime))
## ...1 category persistent_id date
## 0 0 732 0
## lat long street_id street_name
## 0 0 0 0
## context id location_type location_subtype
## 6304 0 0 6282
## outcome_status
## 710
colSums(is.na(temp))
## station_ID date TemperatureCAvg TemperatureCMax TemperatureCMin
## 0 0 0 0 0
## TdAvgC HrAvg WindkmhDir WindkmhInt WindkmhGust
## 0 0 0 0 0
## PresslevHp Precmm TotClOct lowClOct SunD1h
## 0 24 0 5 0
## VisKm SnowDepcm PreselevHp
## 0 364 366
# Missing Data visualisation
gg_miss_var(crime) + ggtitle("Missing Data in Crime Dataset")
gg_miss_var(temp) + ggtitle("Missing Data in Temperature Dataset")
In the crime data, context was entirely missing (6,304 of 6,304 rows), location_subtype was missing in 6,282 rows, persistent_id in 732, and outcome_status in 710. In the temperature data, Precmm had 24 missing entries, lowClOct had 5, SnowDepcm was missing for 364 of 366 days, and PreselevHp was completely missing.
To visualize missingness, the gg_miss_var() function from the naniar package was used. By default it plots the number of missing values per variable, which confirmed the counts above and helped decide which fields should be dropped, imputed, or retained for analysis.
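The missing-value counts above can also be expressed as percentages of rows, which makes columns easier to compare across datasets of different sizes. The following is a minimal base R sketch that reuses the counts printed for the crime data (6,304 rows); the object names are illustrative and not part of the original analysis.

```r
# Missing-value counts for the crime data, copied from the colSums(is.na(crime)) output
n_rows <- 6304
na_counts <- c(persistent_id = 732, context = 6304,
               location_subtype = 6282, outcome_status = 710)

# Share of rows missing per column, as a percentage rounded to one decimal place
na_pct <- round(100 * na_counts / n_rows, 1)

# Sort so the emptiest columns come first
print(sort(na_pct, decreasing = TRUE))
```

On the full data frame the same figures come from round(100 * colMeans(is.na(crime)), 1); naniar's gg_miss_var() also accepts show_pct = TRUE to plot percentages instead of counts.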
The data was cleaned by removing columns that were either unnecessary or lacked sufficient information. In the crime dataset, variables like context and location_subtype were excluded because they contained either fully missing or nearly empty entries. Likewise, the temperature dataset had columns such as PreselevHp, which was entirely missing, and SnowDepcm, which had too little usable data — both were dropped to minimize noise and focus on meaningful variables.
# Dropping the context, location_subtype, and persistent_id columns
crime_clean <- crime %>%
select(-context, -location_subtype, -persistent_id )
temp_clean <- temp %>%
select(-PreselevHp, -SnowDepcm)
# Replacing NA outcome_status with "Unknown" in crime file
crime_clean <- crime_clean %>%
mutate(outcome_status = ifelse(is.na(outcome_status), "Unknown", outcome_status))
# Filling NA in Precmm with 0, and mean-impute lowClOct in temp file
temp_clean <- temp_clean %>%
mutate(
Precmm = ifelse(is.na(Precmm), 0, Precmm),
lowClOct = ifelse(is.na(lowClOct), mean(lowClOct, na.rm = TRUE), lowClOct)
)
# Making sure there are no missing values
sapply(crime_clean, function(x) sum(is.na(x)))
## ...1 category date lat long
## 0 0 0 0 0
## street_id street_name id location_type outcome_status
## 0 0 0 0 0
sapply(temp_clean, function(x) sum(is.na(x)))
## station_ID date TemperatureCAvg TemperatureCMax TemperatureCMin
## 0 0 0 0 0
## TdAvgC HrAvg WindkmhDir WindkmhInt WindkmhGust
## 0 0 0 0 0
## PresslevHp Precmm TotClOct lowClOct SunD1h
## 0 0 0 0 0
## VisKm
## 0
colnames(temp_clean)
## [1] "station_ID" "date" "TemperatureCAvg" "TemperatureCMax"
## [5] "TemperatureCMin" "TdAvgC" "HrAvg" "WindkmhDir"
## [9] "WindkmhInt" "WindkmhGust" "PresslevHp" "Precmm"
## [13] "TotClOct" "lowClOct" "SunD1h" "VisKm"
Remaining missing values were then handled: in the crime data, outcome_status NAs were replaced with “Unknown” to retain those records. In the temperature data, missing rainfall (Precmm) was assumed to be zero, while missing values in lowClOct were imputed using the mean.
A final check confirmed that the cleaned datasets were complete and ready for further processing. crime_clean had no remaining missing values, and temp_clean had all relevant numeric fields filled. This comprehensive cleaning ensures the datasets are suitable for accurate aggregation, merging, and visualization in the next steps.
Additionally, the persistent_id column, which could be used to track individual crime records over time, was dropped. It was missing for 732 records, and since our analysis focuses on aggregation and visualization for a single time period, the identifier was not needed.
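Filling missing rainfall with zero is a modelling choice with different consequences for totals and averages. The sketch below uses a short, made-up rainfall vector (not values from temp24.csv) to show that zero-filling leaves a sum unchanged relative to skipping the NAs, but pulls a mean downward.

```r
# Hypothetical daily rainfall in mm with two missing readings (illustrative values only)
rain <- c(0, 2.4, NA, 5.0, NA, 0.6)

# Same zero-fill rule as used for Precmm
rain_zero <- ifelse(is.na(rain), 0, rain)

# Totals agree whether NAs are skipped or treated as zero
total_skip <- sum(rain, na.rm = TRUE)
total_zero <- sum(rain_zero)

# Means differ: zero-filling adds dry days to the denominator
mean_skip <- mean(rain, na.rm = TRUE)
mean_zero <- mean(rain_zero)

print(c(total_skip = total_skip, total_zero = total_zero))
print(c(mean_skip = mean_skip, mean_zero = mean_zero))
```

Because the monthly aggregation in this report uses total rainfall, the zero-fill does not change those totals; it would matter if mean daily rainfall were used instead.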
Crime and weather data operated on different time scales—monthly for crime and daily for weather—so the weather data was aggregated to a monthly level. This adjustment allowed for a more meaningful comparison between the two datasets, reducing short-term fluctuations and highlighting broader seasonal trends.
We focused on three weather variables: average temperature, total rainfall, and average wind speed. These influence human behavior and may impact crime. For instance, warm weather can lead to more public activity (and more opportunities for crime), while rain may keep people indoors. Wind affects comfort and visibility, possibly influencing both criminal behavior and police response.
# date is already a "YYYY-MM" string, so it can be used directly as the month key.
# Build on crime_clean (not the raw crime data) so the earlier column drops
# and the "Unknown" recoding of outcome_status are kept.
crime_clean <- crime_clean %>%
mutate(month = date)
# Grouping by month
crime_monthly <- crime_clean %>%
group_by(month) %>%
summarise(crime_count = n())
# Ensuring character type
crime_monthly$month <- as.character(crime_monthly$month)
# Creating temp_monthly by summarising weather data per month
temp_monthly <- temp_clean %>%
mutate(month = format(date, "%Y-%m")) %>%
group_by(month) %>%
summarise(
avg_temp = mean(TemperatureCAvg, na.rm = TRUE),
total_rain = sum(Precmm, na.rm = TRUE),
avg_wind = mean(WindkmhInt, na.rm = TRUE) # mean of the daily wind speed column
)
# Merging with crime_monthly and temp_monthly
merged_monthly <- left_join(crime_monthly, temp_monthly, by = "month")
# View it
head(merged_monthly)
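A left join keeps every crime month and fills the weather columns with NA where no matching key exists, so it is worth confirming that both tables use identical month strings before merging. A minimal sketch with hypothetical keys (the vectors below are illustrative, not taken from the datasets):

```r
# Hypothetical month keys as they would appear in the crime data ("YYYY-MM" strings)
crime_months <- c("2024-01", "2024-02", "2024-03")

# Hypothetical daily weather dates, converted with the same format() call used above
weather_dates <- as.Date(c("2024-01-15", "2024-02-10", "2024-02-11", "2024-03-05"))
weather_months <- unique(format(weather_dates, "%Y-%m"))

# Months present in one table but not the other; both should be empty
missing_in_weather <- setdiff(crime_months, weather_months)
missing_in_crime <- setdiff(weather_months, crime_months)

print(missing_in_weather)
print(missing_in_crime)
```

Applied to the real tables, this would be setdiff(crime_monthly$month, temp_monthly$month); any non-empty result means some rows of merged_monthly would carry NA weather values.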
We began the analysis by summarizing the frequency of each crime type using a one-way table and then cross-tabulated those with their respective outcome statuses in a two-way table. This approach revealed not only the most common types of crime but also shed light on how often they were resolved—or left unresolved—by the authorities.
# One-way frequency table of crime types
crime_table <- crime_clean %>%
count(category, sort = TRUE)
datatable(crime_table, options = list(pageLength = 15), caption = "Frequency of Crime Types")
# Two-way table: Crime category by outcome status
crime_2way <- table(crime_clean$category, crime_clean$outcome_status)
library(knitr)
kable(crime_2way, caption = "Crime Type by Outcome Status")
| | Action to be taken by another organisation | Awaiting court outcome | Court result unavailable | Formal action is not in the public interest | Further action is not in the public interest | Further investigation is not in the public interest | Investigation complete; no suspect identified | Local resolution | Offender given a caution | Status update unavailable | Suspect charged as part of another case | Unable to prosecute suspect | Under investigation | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| anti-social-behaviour | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 710 |
| bicycle-theft | 3 | 3 | 2 | 0 | 0 | 0 | 113 | 0 | 0 | 2 | 0 | 16 | 10 | 0 |
| burglary | 1 | 10 | 6 | 0 | 0 | 0 | 108 | 0 | 0 | 7 | 0 | 26 | 13 | 0 |
| criminal-damage-arson | 6 | 28 | 34 | 0 | 1 | 0 | 246 | 9 | 11 | 10 | 0 | 110 | 24 | 0 |
| drugs | 2 | 13 | 19 | 7 | 4 | 0 | 18 | 118 | 16 | 9 | 0 | 16 | 43 | 0 |
| other-crime | 4 | 7 | 12 | 2 | 1 | 1 | 14 | 1 | 1 | 5 | 0 | 43 | 9 | 0 |
| other-theft | 1 | 6 | 5 | 2 | 0 | 0 | 283 | 1 | 0 | 12 | 1 | 74 | 27 | 0 |
| possession-of-weapons | 2 | 8 | 9 | 0 | 1 | 0 | 6 | 4 | 4 | 8 | 0 | 16 | 7 | 0 |
| public-order | 8 | 20 | 20 | 11 | 0 | 0 | 165 | 12 | 3 | 27 | 0 | 160 | 32 | 0 |
| robbery | 0 | 7 | 4 | 1 | 0 | 0 | 41 | 0 | 0 | 4 | 0 | 25 | 3 | 0 |
| shoplifting | 5 | 82 | 65 | 2 | 0 | 0 | 313 | 25 | 1 | 7 | 0 | 92 | 37 | 0 |
| theft-from-the-person | 2 | 2 | 0 | 0 | 0 | 0 | 66 | 0 | 0 | 2 | 0 | 14 | 5 | 0 |
| vehicle-crime | 1 | 7 | 6 | 0 | 0 | 0 | 208 | 0 | 0 | 3 | 0 | 33 | 12 | 0 |
| violent-crime | 84 | 96 | 107 | 12 | 7 | 1 | 446 | 31 | 20 | 141 | 0 | 1195 | 280 | 0 |
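Violent crime is by far the most common category (2,420 of 6,304 records), followed by anti-social behaviour (710), shoplifting (629), and criminal damage/arson (479). Notably, 1,195 violent crimes were marked ‘Unable to prosecute suspect’ and 446 ‘Investigation complete; no suspect identified’, highlighting investigative challenges.
Drug offences more often end in a recorded outcome: 118 of 265 were settled by local resolution and 16 by a caution, while only 18 closed with no suspect identified. Weapon possession cases also rarely close without a suspect (6 of 65). Public order offences do not follow this pattern: 325 of 458 ended either with no suspect identified or with the suspect unable to be prosecuted.
In contrast, all 710 anti-social behaviour cases are marked “Unknown”. These are exactly the records whose outcome_status was missing and was recoded during cleaning, so the label reflects the absence of outcome data for this category in the dataset rather than a recorded police outcome.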
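Raw counts make categories of very different size hard to compare, so row proportions are a useful complement. The sketch below uses counts taken from the two-way table above for three categories; the "Other outcomes" column is the row total minus the three named outcomes, and the object names are illustrative.

```r
# Counts from the two-way table; "Other outcomes" = row total minus the three listed columns
outcome_counts <- matrix(
  c(446, 1195,  31, 748,   # violent-crime (row total 2420)
     18,   16, 118, 113,   # drugs         (row total 265)
    313,   92,  25, 199),  # shoplifting   (row total 629)
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("violent-crime", "drugs", "shoplifting"),
    c("No suspect identified", "Unable to prosecute suspect",
      "Local resolution", "Other outcomes")
  )
)

# margin = 1 divides each row by its own total, giving within-category shares
outcome_share <- prop.table(outcome_counts, margin = 1)

print(round(100 * outcome_share, 1))
```

On the full table, prop.table(crime_2way, margin = 1) gives the same breakdown for every category and outcome.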
We visualized the frequency of each category using three distinct yet complementary methods: a bar plot, a pie chart, and a dot plot to better understand the distribution of crime categories in Colchester during 2024. These plots all used the same data but offered different lenses through which to interpret it.
# Bar plot of crime categories
crime_clean %>%
count(category, sort = TRUE) %>%
ggplot(aes(x = fct_reorder(category, n), y = n)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Frequency of Crime Categories",
x = "Crime Category", y = "Count")
# Pie Chart
crime_pie <- crime_clean %>%
count(category)
my_colors <- c(
"anti-social-behaviour" = "purple",
"bicycle-theft" = "blue",
"burglary" = "darkgreen",
"criminal-damage-arson" = "orange",
"drugs" = "red",
"other-crime" = "darkmagenta",
"other-theft" = "skyblue",
"possession-of-weapons" = "brown",
"public-order" = "gold",
"robbery" = "cyan",
"shoplifting" = "darkred",
"theft-from-the-person" = "navy",
"vehicle-crime" = "grey30",
"violent-crime" = "black"
)
ggplot(crime_pie, aes(x = "", y = n, fill = category)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y") +
scale_fill_manual(values = my_colors) +
theme_void() +
labs(title = "Crime Category Proportions (Pie Chart)")
# Dot Plot – Alternative to Bar
crime_dot <- crime_clean %>%
count(category, sort = TRUE)
ggplot(crime_dot, aes(x = reorder(category, n), y = n)) +
geom_point(color = "darkred", size = 3) +
coord_flip() +
labs(title = "Dot Plot of Crime Categories",
x = "Crime Category", y = "Frequency")
The bar plot highlights violent crime as the most frequent, far surpassing other types like anti-social behaviour, shoplifting, and criminal damage/arson. These four dominate the overall crime pattern, with the bar plot enabling clear comparisons.
The pie chart, though less precise, emphasizes proportions—most notably the large black segment for violent crime. Custom colors enhance clarity and contrast.
The dot plot offers a clean, position-based view of frequencies. It supports earlier insights, showing violent crime as most common, while crimes like weapon possession, robbery, and personal theft rank lowest.
Let us look into the distribution of key weather variables—average daily temperature and precipitation—using histograms and density plots. These visualizations provide insight into the central tendencies, variability, and skewness of weather conditions in Colchester during 2024.
# Histogram of Daily Average Temperature
ggplot(temp_clean, aes(x = TemperatureCAvg)) +
geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
labs(title = "Histogram of Daily Average Temperature",
x = "Average Temperature (\u00B0C)", y = "Frequency") +
theme_minimal()
# Density Plot of Daily Rainfall (Precmm)
ggplot(temp_clean, aes(x = Precmm)) +
geom_density(fill = "tomato", alpha = 0.6) +
labs(title = "Density Plot of Daily Rainfall",
x = "Precipitation (mm)", y = "Density") +
theme_minimal()
# Compare Temperature Distribution by Month
# Extract month from date
temp_clean <- temp_clean %>%
mutate(month = format(date, "%Y-%m"))
# Monthly temperature density plot
ggplot(temp_clean, aes(x = TemperatureCAvg, fill = month)) +
geom_density(alpha = 0.4) +
labs(title = "Monthly Temperature Distribution (Density Plot)",
x = "Average Temperature (deg C)",
y = "Density") +
theme_minimal()
The daily temperature histogram shows a roughly bell-shaped distribution centered near 11°C (mean 10.98°C, median 10.95°C), with half of all days falling between 7.0°C and 14.5°C. Days below 5°C or above 20°C are uncommon, reflecting a temperate climate.
Rainfall density is sharply right-skewed, with most days seeing little to no rain and a few showing heavy rainfall. This pattern highlights the dominance of dry days and occasional extremes.
The monthly temperature density plot reveals seasonal shifts—warmer days cluster in summer, colder ones in early and late months. These trends offer useful context for exploring seasonal links to crime rates.
Average daily temperatures across the year were visualized using box plots, violin plots, and sina plots. Each of these visualization methods offers a unique lens on the data, allowing us to explore overall trends, variability, and outliers with greater clarity and depth.
# Box Plot of Temperature by Month
ggplot(temp_clean, aes(x = month, y = TemperatureCAvg)) +
geom_boxplot(fill = "lightblue") +
labs(title = "Box Plot of Monthly Temperature",
x = "Month", y = "Average Temperature (deg C)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Violin Plot of Temperature by Month
ggplot(temp_clean, aes(x = month, y = TemperatureCAvg)) +
geom_violin(fill = "orchid", alpha = 0.7) +
labs(title = "Violin Plot of Monthly Temperature",
x = "Month", y = "Average Temperature (deg C)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Sina Plot (if ggforce is installed)
library(ggforce)
## Warning: package 'ggforce' was built under R version 4.4.3
ggplot(temp_clean, aes(x = month, y = TemperatureCAvg)) +
geom_sina(color = "steelblue", alpha = 0.5) + # points use colour; fill has no effect on the default shape
labs(title = "Sina Plot of Monthly Temperature",
x = "Month", y = "Average Temperature (deg C)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The box plot shows seasonal temperature trends (cold in winter, warm from May to September) with higher variability in April and October. Outliers, especially in spring and autumn, suggest occasional temperature extremes.
The violin plot expands on this by displaying full distributions. Summer months like July and August show tight clusters around 18–20°C, while winter months have wider spreads, highlighting more variable cold conditions.
The sina plot adds detail by showing each temperature data point. It reveals clustering, gaps, and density patterns across months, providing a granular view that reinforces seasonal trends.
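The points drawn beyond the whiskers of a box plot follow a fixed rule: geom_boxplot() flags values lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule on made-up temperatures (illustrative values, not from temp24.csv):

```r
# Hypothetical daily average temperatures for one month, in deg C
temps <- c(2, 4, 5, 5, 6, 6, 7, 7, 8, 15)

# Quartiles and interquartile range
q <- unname(quantile(temps, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values outside the fences would be plotted as individual outlier points
outliers <- temps[temps < lower_fence | temps > upper_fence]
print(outliers)
```

Applying the same calculation within each month shows which days the box plot marks as outliers and how far they sit from the bulk of that month's readings.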
#Box Plot of Rainfall by Month
ggplot(temp_clean, aes(x = month, y = Precmm)) +
geom_boxplot(fill = "lightgreen") +
coord_cartesian(ylim = c(0, 10)) +
labs(title = "Box Plot of Monthly Rainfall (Zoomed In)",
x = "Month", y = "Rainfall (mm)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
#Violin Plot of Rainfall by Month
ggplot(temp_clean, aes(x = month, y = Precmm)) +
geom_violin(fill = "darkturquoise", alpha = 0.6) +
coord_cartesian(ylim = c(0, 10)) +
labs(title = "Violin Plot of Monthly Rainfall (Zoomed In)",
x = "Month", y = "Rainfall (mm)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
We used box and violin plots to explore monthly rainfall, focusing on the 0–10 mm range with coord_cartesian() to better show common values.
Box plots reveal low median rainfall—often below 2 mm—across months, with greater variability and more outliers from January to May, indicating occasional heavy rain.
The violin plots show rainfall clustered near zero, but wider shapes in early spring reflect a broader range of outcomes. Narrower violins in summer highlight consistently dry conditions, reinforcing seasonal rainfall patterns.
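Zooming with coord_cartesian() only changes the visible window, so the box and violin statistics are still computed from all days, including those above 10 mm. Filtering the data first (or using ylim(), which drops out-of-range values before statistics are computed) would change the summaries. A sketch of the difference on made-up rainfall values:

```r
# Hypothetical daily rainfall in mm (illustrative values only)
rain <- c(0, 0, 0, 0.2, 0.4, 1, 1.6, 3, 8, 12, 20, 38)

# Statistics on all days: what a plot zoomed with coord_cartesian() still reflects
median_all <- median(rain)

# Statistics after discarding days above 10 mm: what filtering the data would produce
median_trimmed <- median(rain[rain <= 10])

print(c(all_days = median_all, trimmed = median_trimmed))
```

The trimmed median is lower because the wettest days were removed before it was computed, which is why the zoomed plots here keep the full data.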
We explored the potential relationship between weather conditions and crime levels in Colchester by creating scatter plots and a pair plot using monthly-aggregated data. These visualizations made it easier to spot patterns or correlations that might be missed when relying solely on summary statistics.
# Scatter Plot: Crime vs. Temperature
ggplot(merged_monthly, aes(x = avg_temp, y = crime_count)) +
geom_point(color = "darkblue", size = 3) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Crime Count vs. Average Temperature",
x = "Average Temperature (deg C)", y = "Monthly Crime Count") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Scatter Plot: Crime vs. Rainfall
ggplot(merged_monthly, aes(x = total_rain, y = crime_count)) +
geom_point(color = "purple", size = 3) +
geom_smooth(method = "lm", se = FALSE, color = "orange") +
labs(title = "Crime Count vs. Total Monthly Rainfall",
x = "Total Rainfall (mm)", y = "Monthly Crime Count") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#Pair Plot (Multiple Variables at Once)
library(GGally)
## Warning: package 'GGally' was built under R version 4.4.3
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
# Select relevant columns
ggpairs(merged_monthly[, c("crime_count", "avg_temp", "total_rain", "avg_wind")],
title = "Pair Plot of Crime and Weather Variables")
The scatter plot of crime vs. temperature shows a slight positive trend—crime tends to rise with warmer weather. This aligns with theories linking outdoor activity and social interaction in warmer months to increased crime.
Surprisingly, crime also rises with rainfall, as seen in a second scatter plot. While we’d expect rain to deter crime, this trend may be driven by a few high-rainfall, high-crime months and could reflect underlying factors needing further analysis.
The pair plot provides a comprehensive look at all numerical variables—crime_count, avg_temp, total_rain, and avg_wind. Key takeaways include:
A moderate positive correlation between temperature and crime (r = 0.422)
A slightly stronger correlation between rainfall and crime (r = 0.575)
A negative correlation between crime and wind speed (r = -0.452)
The strongest inverse relationship in the matrix is between wind speed and temperature (r = -0.634)
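With monthly data the correlations rest on very few points, so it is worth checking how much evidence each coefficient carries. The sketch below converts the reported r values into two-sided p-values using the t distribution, assuming n = 12 monthly observations (an assumption; the number of rows in merged_monthly is not printed above). The helper name r_to_p is hypothetical.

```r
# Convert a Pearson correlation r from n paired observations into a two-sided p-value
r_to_p <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))  # t statistic with n - 2 degrees of freedom
  2 * pt(-abs(t_stat), df = n - 2)
}

# Correlations reported in the pair plot
r_values <- c(temp_crime = 0.422, rain_crime = 0.575,
              wind_crime = -0.452, wind_temp = -0.634)

# ASSUMPTION: one observation per month of 2024
n_months <- 12

p_values <- sapply(r_values, r_to_p, n = n_months)
print(round(p_values, 3))
```

On the actual data, cor.test(merged_monthly$crime_count, merged_monthly$avg_temp) performs the same test and also returns a confidence interval.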
Next, we examine how temperature, rainfall, and wind speed correlate with crime levels in Colchester over 2024. Using both correlation matrices and time series plots, we look for consistent associations and seasonal patterns. With monthly aggregates from a single year, these can suggest hypotheses but cannot establish causality.
# Select numeric columns
cor_data <- merged_monthly %>%
select(where(is.numeric))
# Calculate correlation matrix
cor_matrix <- cor(cor_data, use = "complete.obs")
# Basic circular correlation plot
library(corrplot)
corrplot(cor_matrix, method = "circle", type = "upper",
tl.cex = 0.9, addCoef.col = "black", number.cex = 0.7)
library(ggcorrplot)
ggcorrplot(cor_matrix, lab = TRUE, type = "lower",
colors = c("darkred", "white", "darkgreen"),
title = "Correlation Matrix of Crime and Weather Variables")
Correlation analysis using corrplot and ggcorrplot shows a moderate positive link between temperature and crime (r = 0.42), supporting the idea that warmer weather may encourage activity—and thus crime.
Rainfall also correlates positively with crime (r = 0.58), which may reflect specific periods of high activity despite rain. Wind speed shows a moderate negative correlation with crime (r = -0.45). This could mean wind deters crime by limiting outdoor movement, but wind speed is also negatively correlated with temperature (r = -0.63), so the two effects cannot be separated here.
The time series analysis, enhanced with smoothing techniques, provides a clear view of how crime counts and weather variables evolve month by month throughout 2024.
# Counting Crime Over Time
ggplot(merged_monthly, aes(x = month, y = crime_count, group = 1)) +
geom_line(color = "steelblue", linewidth = 1.2) +
geom_point(color = "black", size = 2) +
geom_smooth(se = FALSE, color = "red", method = "loess") +
labs(title = "Monthly Crime Trend in 2024",
x = "Month", y = "Crime Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## `geom_smooth()` using formula = 'y ~ x'
# Average Temperature Over Time
ggplot(merged_monthly, aes(x = month, y = avg_temp, group = 1)) +
geom_line(color = "darkgreen", linewidth = 1.2) +
geom_point(color = "black", size = 2) +
geom_smooth(se = FALSE, color = "blue", method = "loess") +
labs(title = "Monthly Average Temperature Trend",
x = "Month", y = "Temperature (deg C)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## `geom_smooth()` using formula = 'y ~ x'
# Rainfall Over Time
ggplot(merged_monthly, aes(x = month, y = total_rain, group = 1)) +
geom_line(color = "dodgerblue3", linewidth = 1.2) +
geom_point(color = "black", size = 2) +
geom_smooth(se = FALSE, color = "darkorange", method = "loess") +
labs(title = "Monthly Rainfall Trend",
x = "Month", y = "Total Rainfall (mm)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## `geom_smooth()` using formula = 'y ~ x'
Time series plots show clear seasonal trends in crime and weather. Crime peaks in July and September, closely mirroring rising temperatures, which peak in August—supporting the idea that warmer weather may elevate crime due to increased activity.
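Rainfall peaks in February and May, and crime does not visibly rise or fall in step with those peaks. This sits uneasily with the positive monthly correlation between rainfall and crime (r = 0.58), which suggests that the correlation is driven by a small number of months rather than a consistent month-to-month relationship.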
Temperature follows a smooth seasonal curve, offering context for comparing crime patterns. These trends highlight the value of weather data in anticipating crime surges and improving resource planning for public safety.
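The time series above place "YYYY-MM" strings on the x-axis, which ggplot2 treats as discrete categories. Converting them to Date values allows a continuous date axis. A minimal sketch, with illustrative month strings:

```r
# Month keys in the same "YYYY-MM" format used by merged_monthly
month_keys <- c("2024-01", "2024-02", "2024-03")

# Append a day so as.Date() can parse the string; each month maps to its first day
month_dates <- as.Date(paste0(month_keys, "-01"), format = "%Y-%m-%d")

print(month_dates)
print(format(month_dates, "%b %Y"))  # short labels for axis ticks (locale-dependent)
```

With a Date column on the x-axis, scale_x_date(date_labels = "%b") could replace the rotated text labels.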
We used the leaflet package to build an interactive map based on geographic coordinates from the crime dataset, offering a clearer view of how crime is distributed across Colchester. To make the analysis more engaging and easier to explore, two of the most relevant visualizations were also converted into interactive formats using Plotly.
Each red dot represents an individual crime location, plotted using latitude and longitude. Users can click a marker to open a popup showing the crime type and outcome status, offering detailed, location-specific insights.
This geospatial visualization reveals clear clustering of criminal activity in the town center, especially around the High Street and surrounding urban areas. Outskirts such as the southern residential zones show far fewer incidents. Such clustering supports targeted policing and resource deployment by highlighting high-crime zones visually and interactively.
# Load the leaflet map using cleaned crime data
leaflet(data = crime_clean) %>%
addTiles() %>% # Adding default OpenStreetMap tiles
addCircleMarkers(
lng = ~long, # Adding Longitude from the dataset
lat = ~lat, # Adding Latitude from the dataset
popup = ~paste("Crime:", category, "<br>", # Popup info: crime category
"Outcome:", outcome_status), # and outcome status
radius = 2, # Small circle markers for clarity
color = "red", # Red color for markers
fillOpacity = 0.7 # Adding slight transparency to reduce overlap visibility
) %>%
addScaleBar(position = "bottomleft") %>% # Adding a scale bar to bottom-left
setView( # Center the map view
lng = mean(crime_clean$long, na.rm = TRUE), # Mean longitude
lat = mean(crime_clean$lat, na.rm = TRUE), # Mean latitude
zoom = 12 # Default zoom level
)
# The above map can look cluttered when many crimes are located close together.
# Let's now visualize the same map using **marker clustering** to handle overlap.
leaflet(data = crime_clean) %>%
addTiles() %>% # Adding default OpenStreetMap tiles
addMarkers(
lng = ~long, # Longitude from the dataset
lat = ~lat, # Latitude from the dataset
clusterOptions = markerClusterOptions(), # Enabling automatic clustering
popup = ~paste("Crime:", category, "<br>", # Popup info
"Outcome:", outcome_status)
) %>%
setView(
lng = mean(crime_clean$long, na.rm = TRUE),
lat = mean(crime_clean$lat, na.rm = TRUE),
zoom = 12
)
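The clustering visible on the maps can also be checked numerically by counting incidents per grid cell. The sketch below is a rough illustration only: `toy_points` holds synthetic coordinates (not the Colchester data), and the 0.01 degree `cell_size` is an arbitrary assumption. With the real data, `crime_clean` and its `lat` and `long` columns would take the place of `toy_points`.

```r
# Synthetic placeholder coordinates: a dense cluster plus scattered points.
# These are NOT real crime locations.
set.seed(7)
toy_points <- data.frame(
  lat  = c(rnorm(80, mean = 51.889, sd = 0.003), runif(20, 51.86, 51.92)),
  long = c(rnorm(80, mean = 0.901,  sd = 0.004), runif(20, 0.85, 0.95))
)

# Snap each point to the south-west corner of its grid cell (assumed cell size)
cell_size <- 0.01
toy_points$cell_lat  <- floor(toy_points$lat  / cell_size) * cell_size
toy_points$cell_long <- floor(toy_points$long / cell_size) * cell_size
toy_points$n <- 1

# Count incidents per cell and list the busiest cells first
cell_counts <- aggregate(n ~ cell_lat + cell_long, data = toy_points, FUN = sum)
cell_counts <- cell_counts[order(-cell_counts$n), ]
head(cell_counts, 5)
```

A few cells holding most of the incidents would support the visual impression of a town-centre hotspot. A coarse grid like this is only a first check, not a formal hotspot test.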
The first of the two interactive Plotly visualizations is a scatter plot showing the relationship between crime count and average temperature. In its interactive form, users can hover over individual points to view exact values, zoom into specific temperature ranges, and visually explore the positive correlation between warmer conditions and higher crime counts.
# Interactive Scatter Plot: Crime vs. Temperature
library(plotly)
# Creating plotly object from ggplot
p1 <- ggplot(merged_monthly, aes(x = avg_temp, y = crime_count)) +
geom_point(color = "darkred", size = 3) +
geom_smooth(method = "lm", se = FALSE, color = "black") +
labs(title = "Interactive: Crime Count vs. Avg Temperature",
x = "Average Temperature (deg C)", y = "Crime Count") +
theme_minimal()
# Converting to plotly interactive plot
ggplotly(p1)
## `geom_smooth()` using formula = 'y ~ x'
# Also, interactive Line Plot of Crime Trend
p2 <- ggplot(merged_monthly, aes(x = month, y = crime_count, group = 1)) +
geom_line(color = "steelblue", linewidth = 1.2) +
geom_point(color = "black", size = 2) +
labs(title = "Interactive: Monthly Crime Trend", x = "Month", y = "Crime Count") +
theme_minimal()
ggplotly(p2)
The second plot is a time series of monthly crime counts for 2024. This interactive line graph allows users to trace seasonal fluctuations, clearly highlighting July as the peak month, shortly before average temperatures reach their August high.
Through this analysis of Colchester’s 2024 crime and weather data, several patterns emerged that offer valuable insights into the relationship between environmental conditions, seasonal trends, and public safety. By linking crime frequency with monthly weather patterns—particularly average temperature, rainfall, and wind speed—and visualizing outcomes geographically and interactively, we have gained a clearer understanding of how external factors may influence crime rates and their spatial distribution.
The data suggest that crime in Colchester increases with warmer weather. This trend is supported both visually and statistically. From the correlation matrix, average temperature and crime count share a moderate positive correlation of +0.42, meaning that warmer months tend to record more incidents of crime.
Looking at the monthly breakdown, July recorded the highest number of crimes at 608 incidents, with an average temperature of 16.5°C, followed by May (568 incidents, 13.4°C). August, with the highest average temperature at 18.1°C, recorded 533 incidents. These findings are broadly consistent with established criminological theories that associate warm weather with increased outdoor activity, social interaction, and hence, more opportunities for conflict or opportunistic crimes such as theft.
Conversely, cooler months like January (529 crimes, 4.2°C) and February (546 crimes, 7.7°C) reported lower counts than the July peak, although February slightly exceeded August. The differences are not drastic, but the time series plot shows a seasonal rise and fall that broadly aligns with temperature changes.
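With only twelve monthly observations, a correlation is easier to interpret alongside a confidence interval. The sketch below uses base R's `cor.test()` on just the five months quoted in the text, so its output will not match the full-year value of +0.42. The data frame name `quoted_months` is an illustrative assumption; with the real data, the same call would be made on `merged_monthly$crime_count` and `merged_monthly$avg_temp`.

```r
# Only the five months quoted in the text; the full analysis uses all twelve.
quoted_months <- data.frame(
  month       = c("Jan", "Feb", "May", "Jul", "Aug"),
  crime_count = c(529, 546, 568, 608, 533),
  avg_temp    = c(4.2, 7.7, 13.4, 16.5, 18.1)
)

# Pearson correlation with a 95% confidence interval and p-value
temp_test <- cor.test(quoted_months$crime_count,
                      quoted_months$avg_temp,
                      method = "pearson")

temp_test$estimate   # correlation coefficient for this five-month subset
temp_test$conf.int   # 95% confidence interval
temp_test$p.value    # p-value for the test of zero correlation
```

A wide interval that includes zero would indicate that the association, while positive, cannot be distinguished from chance at this sample size.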
Specific crime categories do exhibit clear seasonal trends. Based on frequency tables and visual outputs, anti-social behaviour, violent crimes, and bicycle theft were all more common during the spring and summer months, especially from May to August.
For example:
Anti-social behaviour and public order offences peaked in July, coinciding with Colchester’s warmest months.
Bicycle thefts also rose notably during summer, likely due to increased cycling activity in fair weather. A closer inspection of the bar plots shows these categories surging just as temperatures hit their annual highs.
On the other hand, burglary, vehicle crime, and criminal damage showed a more uniform distribution across the year, suggesting these crimes are less dependent on seasonal variables and more influenced by other factors such as opportunity, socioeconomic conditions, or routine household activity.
These insights support the hypothesis that weather-sensitive crimes—those that occur more often in public spaces or require public presence—peak during warmer, more sociable months.
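A seasonal pattern by category can be tabulated by counting incidents per category and month and then picking each category's peak month. The sketch below is a minimal illustration with dplyr: `toy_crimes` is a tiny synthetic data frame (not the Colchester data), and the `month` column name is an assumption, while `category` matches the column used in the map popups.

```r
library(dplyr)

# Synthetic placeholder rows, one per reported crime. NOT real data.
toy_crimes <- data.frame(
  month    = c("2024-01", "2024-01", "2024-07", "2024-07", "2024-07", "2024-07"),
  category = c("burglary", "anti-social-behaviour", "anti-social-behaviour",
               "anti-social-behaviour", "bicycle-theft", "burglary")
)

# Incidents per category and month, plus each month's share of the category total
monthly_by_category <- toy_crimes %>%
  count(category, month, name = "crime_count") %>%
  group_by(category) %>%
  mutate(share_of_category = crime_count / sum(crime_count)) %>%
  ungroup()

# Peak month per category (ties broken by row order)
peak_months <- monthly_by_category %>%
  group_by(category) %>%
  slice_max(crime_count, n = 1, with_ties = FALSE) %>%
  ungroup()

peak_months
```

Categories whose share is spread evenly across months would correspond to the more uniform crime types described above.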
Geospatial mapping using Leaflet revealed distinct crime clusters within Colchester, especially concentrated in the town centre, including areas near the High Street and Hythe. These hotspots were particularly linked to crimes like shoplifting, anti-social behaviour, and public order offences, which are more likely to occur in high-footfall commercial and leisure zones.
In contrast, residential and peripheral areas exhibited fewer total crimes but were relatively more prone to burglary and property damage, which tend to occur in quieter neighbourhoods with less surveillance and foot traffic.
The spatial analysis thus highlights how environmental context shapes crime patterns:
Urban hubs attract people-based crimes
Outskirts are more vulnerable to property-based offences
These findings reinforce the importance of context-aware policing, where foot patrols and preventive measures are adapted based on local population dynamics and urban design.
Rainfall, somewhat unexpectedly, also showed a positive correlation with crime (r = +0.58), stronger than the correlation with temperature. This is contrary to the common assumption that bad weather discourages outdoor movement and thereby reduces crime. One explanation might be that Colchester experienced high crime during certain high-rainfall months like February (92.6 mm rainfall) and May (80.6 mm rainfall), suggesting that specific incidents or crime types may not be deterred by precipitation, or that rain coincided with other social events or holidays.
Wind speed showed the opposite effect: a negative correlation with crime (r = -0.45). Windier months such as January and April coincided with lower crime counts, which may reflect the discomfort or visibility disruption that discourages outdoor or opportunistic criminal behaviour.
These weather patterns suggest that while temperature shows a moderate positive association with crime, rainfall and wind are also related to public behaviour and safety, albeit in more complex or situational ways. With only twelve monthly observations, all three correlations should be read as indicative rather than conclusive.
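Because a single unusual month can dominate a Pearson correlation in a twelve-point series, it can help to compare it with the rank-based Spearman correlation. The sketch below is illustrative only: `toy_monthly` contains randomly generated placeholder values (not the Colchester figures), `compare_correlations()` is a hypothetical helper, and `avg_wind` is an assumed column name for monthly wind speed.

```r
# Synthetic placeholder data: NOT the Colchester figures.
set.seed(1)
toy_monthly <- data.frame(
  crime_count = round(rnorm(12, mean = 550, sd = 30)),
  avg_temp    = runif(12, 4, 18),
  total_rain  = runif(12, 20, 95),
  avg_wind    = runif(12, 10, 20)   # assumed column name
)

# Hypothetical helper: Pearson and Spearman correlation of each
# predictor with the outcome column
compare_correlations <- function(df, outcome, predictors) {
  data.frame(
    variable = predictors,
    pearson  = sapply(predictors, function(v)
      cor(df[[outcome]], df[[v]], method = "pearson")),
    spearman = sapply(predictors, function(v)
      cor(df[[outcome]], df[[v]], method = "spearman")),
    row.names = NULL
  )
}

cor_table <- compare_correlations(
  toy_monthly, "crime_count", c("avg_temp", "total_rain", "avg_wind")
)
cor_table
```

If the Pearson and Spearman values differ sharply for a variable, the relationship is probably driven by one or two months rather than a steady trend.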
The use of plotly added depth to the visual storytelling. The interactive scatter plot between crime and temperature allowed users to hover over data points and see precise values, which helped illustrate how months with 14°C+ temperatures consistently saw over 500 crimes.
The interactive time series plot further emphasized these peaks, with July and May standing out as the highest crime months. The dynamic view made seasonal crime trends immediately apparent to both technical and non-technical audiences, and could be a valuable tool for public presentations or community engagement.
The findings have practical implications for crime prevention and resource allocation in Colchester:
Seasonal Preparedness: Since crime rises in warmer months, police presence and community outreach could be increased between May and August. Resources like mobile units or outreach vans can be deployed more heavily during this period.
Targeted Awareness Campaigns: With bicycle theft peaking in summer, local councils could run awareness campaigns promoting bike locks and parking safety between June and September.
Area-Specific Policing: Knowing that the town centre is a hotspot for shoplifting, while residential outskirts face burglary, police could adjust patrol routes and install surveillance in risk-prone zones accordingly.
Urban Planning and Lighting: In areas where property crimes are frequent, improved street lighting, CCTV coverage, and neighbourhood watch programs could deter criminal activity.
Use of Weather Forecasts: Integrating weather data into policing systems could enable predictive patrol scheduling, especially when hot, dry weather is expected.
We explored the relationship between crime patterns and weather conditions in Colchester throughout 2024. By combining street-level crime data with daily meteorological records and applying a variety of data visualisation techniques, several key insights emerged.
The analysis revealed that crime in Colchester generally increased during the warmer months of the year. Notably, July (608 incidents) and May (568 incidents) stood out with the highest crime counts, supporting the idea that pleasant weather leads to more outdoor activity and possibly more opportunities for crime to occur.
Certain types of crimes appeared to follow a clear seasonal pattern. Anti-social behaviour and bicycle theft were particularly common during the summer months. This likely reflects increased social interaction, outdoor gatherings, and the higher use of bicycles during this time, which makes them easier targets for theft.
Geographically, crime was not evenly spread across the town. It tended to cluster in specific locations, with central Colchester emerging as a hotspot. This area experienced more public-facing crimes such as shoplifting, possibly due to its busier streets, higher foot traffic, and concentration of retail businesses.
Interestingly, rainfall did not show the dampening effect that might be expected. Monthly rainfall was positively correlated with crime (r = +0.58), with wet months such as February and May also recording high crime counts, while windier months tended to see fewer crimes (r = -0.45). With only twelve monthly observations, these relationships should be treated as tentative.
One unexpected finding was the sharp dip in crime during April, despite moderate temperatures. This anomaly may reflect external influences (e.g., school holidays, public events, or targeted policing campaigns), pointing to the importance of considering socio-political events alongside environmental data.
Also surprising was that burglary and criminal damage did not show strong seasonal shifts, suggesting they are driven by opportunity rather than weather or public activity levels.
To build on the findings from this project, several directions could be explored in future research for deeper and more precise insights.
Use more advanced models (like regression) to understand which weather factors matter most.
Study what times of day crimes happen to see if there’s a daily pattern.
Add demographic data to check if crime levels are related to things like income or housing.
Use social media or news to see how people feel about safety and compare that with real crime numbers.
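As a starting point for the regression idea above, the following sketch fits a linear model of monthly crime count on standardised weather variables, so that the coefficients can be compared on a common scale. It is not part of the original analysis: `toy_monthly` is randomly generated placeholder data, and `avg_wind` is an assumed column name. With only twelve months and three predictors, any real fit would be very uncertain, which is one reason finer-grained (for example daily) data would be preferable.

```r
# Synthetic placeholder data: NOT the Colchester figures.
set.seed(42)
toy_monthly <- data.frame(
  avg_temp   = runif(12, 4, 18),
  total_rain = runif(12, 20, 95),
  avg_wind   = runif(12, 10, 20)   # assumed column name
)
toy_monthly$crime_count <- round(500 + 3 * toy_monthly$avg_temp + rnorm(12, sd = 20))

# Standardise predictors (mean 0, sd 1) so coefficients are comparable
predictors <- c("avg_temp", "total_rain", "avg_wind")
scaled <- toy_monthly
scaled[predictors] <- lapply(scaled[predictors], function(x) as.numeric(scale(x)))

# Multiple linear regression of crime count on the three weather variables
fit <- lm(crime_count ~ avg_temp + total_rain + avg_wind, data = scaled)

summary(fit)$coefficients   # estimates, standard errors, t and p values
```

Because crime counts are non-negative integers, a Poisson or negative binomial model via `glm()` may be a more natural choice than `lm()` once more observations are available.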
https://bczernecki.github.io/climate/reference/meteo_ogimet.html
tidyverse, leaflet, ggplot2, ggcorrplot, etc.